Abstract. We discuss a classical reinterpretation of the quantum-mechanics-based analysis
of classical Markov chains with detailed balance, which rests on the quantum-classical
correspondence. We then use this reinterpretation to reproduce a sufficient condition on the
cooling schedule in classical simulated annealing, which has the inverse-logarithm scaling.



1. Introduction 

Theoretical properties of simulated annealing [1] were extensively studied in the 1980s [2, 3].
One of the main issues in that research was the annealing schedule: How
should one decrease the temperature $T(t)$ as a function of time $t$ in order to finally arrive at a
globally optimal solution? Geman and Geman [4] were the first to obtain an answer, which
states a sufficient condition of the form $T(t) \ge O(1/\log t)$. The inverse-logarithm scaling turned
out to be universal, in the sense that it is also sufficient for many variants of simulated annealing
and some other algorithms. Hajek [5] proved a necessary and sufficient condition which also has
the inverse-logarithm form, showing that one cannot cool any faster than that while
guaranteeing global optimality.

Somma et al., in their recent contribution [6], have shown that the inverse-logarithm scaling
of simulated annealing can also be obtained via the adiabatic condition [7] of a related
quantum-mechanical system. The relationship between the original Markov chain in simulated annealing
and the quantum system is established via the so-called classical-quantum mapping or
quantum-classical correspondence [8, 9, 10]. In this paper, we discuss a classical reformulation of the quantum
equivalent of a classical Markov chain with detailed balance, in order to elucidate the mathematical
structure of the correspondence between a Markov chain and its quantum equivalent, without
making reference to quantum mechanics. We also discuss a classical reformulation of the
argument, based on the quantum adiabatic theorem, that derives the optimal inverse-logarithm
scaling of the annealing schedule [6] (see also [11]).

This paper is organized as follows. In section 2 we first provide a basic formulation of
classical Markov chains with detailed balance, and derive its α-representation. A local linear
approximation of the time evolution in terms of the α-representation is also discussed. Our derivation
of the inverse-logarithm scaling of simulated annealing is presented in section 3. The derivation uses
a "chasing" view of simulated annealing, which rests on the local linear approximation in the
0-representation, and a bound on the largest negative eigenvalue.
In section 4 we discuss the relation between our formulation and the stochastic matrix form
decomposition, which is defined and discussed extensively in [9]. Section 5 concludes the paper.



2. Basic formulations 

2.1. Markov chains 

Let S denote a state space, which is a finite set of cardinality N. Let E be an "energy function"
defined on S, which associates a state $i \in S$ with its energy $E_i$. Then, one can define a probability
distribution on S, in terms of a probability vector $\bar p = (\bar p_i)$, as

$$\bar p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_{i \in S} e^{-\beta E_i}, \tag{1}$$

which is the Gibbs-Boltzmann distribution induced by the energy function E, with $\beta > 0$ a
parameter corresponding to the inverse temperature.

Let us consider an undirected graph G with S as its vertex set and with an edge set L. We assume G
to be a connected graph without self-edges (loops). We define a transition matrix $M = (m_{ij})$ as

$$m_{ij} = \begin{cases} w_{ij}\, e^{\beta (E_j - E_i)/2} & ((ij) \in L) \\ -\sum_{k:(ki)\in L} m_{ki} & (i = j) \\ 0 & (\text{otherwise}) \end{cases} \tag{2}$$

where $W = (w_{ij})$ is a symmetric matrix with $w_{ij} > 0$ for $(ij) \in L$. On the basis of the transition
matrix M, one can define a continuous-time Markov chain, as

$$\dot p = M p. \tag{3}$$

The connectedness of the graph G implies irreducibility of the Markov chain. The Markov chain
is also aperiodic, so that it is ergodic and therefore has a unique equilibrium distribution.
The Gibbs-Boltzmann distribution $\bar p$ is the equilibrium distribution of the Markov chain, since
$M\bar p = 0$ holds.
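The stationarity claim can be checked numerically. The following is a minimal sketch, assuming the rates take the form $m_{ij} = w_{ij} e^{\beta(E_j - E_i)/2}$ on edges with each diagonal entry fixed by the zero-column-sum condition. The three-state path graph, the energies, and the helper names `gibbs` and `generator` are hypothetical choices made only for illustration.

```python
import math

def gibbs(E, beta):
    """Gibbs-Boltzmann probabilities for energies E at inverse temperature beta."""
    weights = [math.exp(-beta * e) for e in E]
    Z = sum(weights)
    return [w / Z for w in weights]

def generator(E, W, beta):
    """Rate matrix M; W[i][j] > 0 marks an edge (i, j) with symmetric weight w_ij."""
    n = len(E)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and W[i][j] > 0:
                M[i][j] = W[i][j] * math.exp(beta * (E[j] - E[i]) / 2)
    for j in range(n):
        # Each column sums to zero, so total probability is conserved.
        M[j][j] = -sum(M[k][j] for k in range(n) if k != j)
    return M

# Hypothetical example: path graph 0 - 1 - 2 with unit weights.
E = [0.0, 1.0, 0.5]
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
beta = 2.0
M = generator(E, W, beta)
p_bar = gibbs(E, beta)

# Residual of M p_bar = 0; it should vanish up to rounding error.
residual = [sum(M[i][j] * p_bar[j] for j in range(3)) for i in range(3)]
print(max(abs(r) for r in residual))
```

The residual vanishes because each pair of terms cancels by detailed balance, $m_{ij}\bar p_j = m_{ji}\bar p_i$, not because of any special property of the chosen graph.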

The formulation presented here is general, including various typical systems as special cases.
For example, conventional Ising spin systems are described by letting $S = \{-1, 1\}^n$ with $N = 2^n$
and L having an n-dimensional hypercubic structure. Metropolis and Glauber dynamics are
implemented by letting

$$w_{ij} \propto \frac{1}{\max\{e^{\beta(E_i - E_j)/2},\, e^{\beta(E_j - E_i)/2}\}}, \tag{4}$$

and

$$w_{ij} \propto \frac{1}{e^{\beta(E_j - E_i)/2} + e^{\beta(E_i - E_j)/2}}, \tag{5}$$

respectively, as mentioned in [9].
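As an illustration of how these weights produce the familiar rates, here is a small sketch. It assumes a proportionality constant of 1, the Metropolis weight $w_{ij} = 1/\max\{e^{\beta(E_i-E_j)/2}, e^{\beta(E_j-E_i)/2}\}$, the Glauber weight $w_{ij} = 1/(e^{\beta(E_i-E_j)/2} + e^{\beta(E_j-E_i)/2})$, and rates $m_{ij} = w_{ij}e^{\beta(E_j-E_i)/2}$ for the jump $j \to i$. The function names and numerical values are hypothetical.

```python
import math

def metropolis_weight(Ei, Ej, beta):
    """Symmetric weight w_ij for Metropolis dynamics (proportionality constant 1 assumed)."""
    a = math.exp(beta * (Ei - Ej) / 2)
    return 1.0 / max(a, 1.0 / a)

def glauber_weight(Ei, Ej, beta):
    """Symmetric weight w_ij for Glauber dynamics (proportionality constant 1 assumed)."""
    a = math.exp(beta * (Ei - Ej) / 2)
    return 1.0 / (a + 1.0 / a)

def rate(w, Ei, Ej, beta):
    """Rate m_ij of the jump j -> i for a given symmetric weight w."""
    return w * math.exp(beta * (Ej - Ei) / 2)

beta, Ei, Ej = 1.5, 0.3, 1.0   # the jump j -> i is downhill here

m_metropolis = rate(metropolis_weight(Ei, Ej, beta), Ei, Ej, beta)
m_glauber = rate(glauber_weight(Ei, Ej, beta), Ei, Ej, beta)

# Expected closed forms: min{1, exp(beta (Ej - Ei))} and 1 / (1 + exp(beta (Ei - Ej))).
print(m_metropolis, min(1.0, math.exp(beta * (Ej - Ei))))
print(m_glauber, 1.0 / (1.0 + math.exp(beta * (Ei - Ej))))
```

Both weights are symmetric in $i$ and $j$, which is what the definition of $W$ requires; the asymmetry of the resulting rates comes entirely from the factor $e^{\beta(E_j - E_i)/2}$.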

2.2. α-representation

We discuss a different representation of the continuous-time Markov chain (3), in view of the
classical-to-quantum mapping utilized in [6]. Although the quantum reformulation mapped
from a classical Markov chain makes use of the square roots of probabilities $\{\sqrt{p_i}\}$, we here discuss
a slightly more general expression which is based on the so-called α-representation of p.

Definition. We define the α-representation $\psi^{(\alpha)} = (\psi_i^{(\alpha)})$ of p as

$$\psi_i^{(\alpha)} = \frac{2}{1-\alpha}\, p_i^{(1-\alpha)/2}. \tag{6}$$

The concept of α-representation was originally introduced in information geometry [12, 13], in
order to discuss intrinsic geometrical structures of statistical manifolds. Taking square roots
of probabilities corresponds to considering the 0-representation. Although not used in this paper,
the 1-representation is defined as

$$\psi_i^{(1)} = \log p_i. \tag{7}$$
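
We next derive an expression of the Markov chain in terms of the α-representation. One has

$$\dot\psi_i^{(\alpha)} = p_i^{-(1+\alpha)/2}\, \dot p_i = \frac{1-\alpha}{2} \sum_j \bigl(\psi_i^{(-\alpha)}\bigr)^{-1} m_{ij}\, \psi_j^{(-\alpha)}\, \psi_j^{(\alpha)}, \tag{8}$$

which is rewritten, in a vector-matrix form, as

$$\dot\psi^{(\alpha)} = \frac{1-\alpha}{2}\, H^{(-\alpha)} \psi^{(\alpha)}, \tag{9}$$

where the matrix $H^{(\alpha)}$ is defined as

$$H^{(\alpha)} = \bigl(\Psi^{(\alpha)}\bigr)^{-1} M\, \Psi^{(\alpha)}, \tag{10}$$

with $\Psi^{(\alpha)} = \mathrm{diag}(\psi_i^{(\alpha)})$. Clearly, the eigenvalues of the matrix $H^{(\alpha)}$ are the same as those of M.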
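The elements of the matrix $H^{(-\alpha)} = (h_{ij}^{(-\alpha)})$ are given by

$$h_{ij}^{(-\alpha)} = \begin{cases} w_{ij}\, e^{\beta(E_j - E_i)/2}\, \psi_j^{(-\alpha)} / \psi_i^{(-\alpha)} & ((ij)\in L) \\ -\sum_{k:(ki)\in L} w_{ki}\, e^{\beta(E_i - E_k)/2} & (i = j) \\ 0 & (\text{otherwise}) \end{cases} \tag{11}$$

and consequently,

$$\bar h_{ij}^{(-\alpha)} = h_{ij}^{(-\alpha)}\big|_{p = \bar p} = \begin{cases} w_{ij}\, e^{\alpha\beta(E_i - E_j)/2} & ((ij)\in L) \\ -\sum_{k:(ki)\in L} w_{ki}\, e^{\beta(E_i - E_k)/2} & (i = j) \\ 0 & (\text{otherwise}) \end{cases} \tag{12}$$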



The above expression shows that the 0-representation is special in our formulation,
in that the matrix $\bar H^{(-\alpha)} = H^{(-\alpha)}|_{p=\bar p}$ becomes symmetric when α = 0, that is, under the
0-representation. The fact that the 0-representation symmetrizes the transition matrix M was
also mentioned in [13], in order to state that the eigenvalues of M are all real. It should be noted
that the matrix $H^{(-\alpha)}$ depends on $\psi^{(\alpha)}$ via $\Psi^{(-\alpha)}$ and therefore is not symmetric
at $p \ne \bar p$ in general.
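The symmetrization at α = 0 can be illustrated numerically. The sketch below conjugates $M$ by $\mathrm{diag}(p_i^{(1+\alpha)/2})$, which agrees with conjugation by the diagonal matrix of the $(-\alpha)$-representation because constant factors cancel. It then evaluates the result at the Gibbs-Boltzmann distribution. The three-state path graph and the names `conjugate` and `asymmetry` are hypothetical, and the rates are assumed to be $m_{ij} = w_{ij}e^{\beta(E_j-E_i)/2}$.

```python
import math

E = [0.0, 1.0, 0.5]
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # path graph 0 - 1 - 2, unit weights
beta = 2.0
n = len(E)

# Rate matrix M: off-diagonal rates on edges, diagonal from zero column sums.
M = [[W[i][j] * math.exp(beta * (E[j] - E[i]) / 2) if i != j else 0.0
      for j in range(n)] for i in range(n)]
for j in range(n):
    M[j][j] = -sum(M[k][j] for k in range(n) if k != j)

Z = sum(math.exp(-beta * e) for e in E)
p_bar = [math.exp(-beta * e) / Z for e in E]

def conjugate(M, p, alpha):
    """D^-1 M D with D = diag(p_i^((1+alpha)/2))."""
    s = [x ** ((1 + alpha) / 2) for x in p]
    return [[M[i][j] * s[j] / s[i] for j in range(n)] for i in range(n)]

def asymmetry(H):
    """Largest absolute difference between H and its transpose."""
    return max(abs(H[i][j] - H[j][i]) for i in range(n) for j in range(n))

H0 = conjugate(M, p_bar, 0.0)       # alpha = 0: expected to be symmetric
H_half = conjugate(M, p_bar, 0.5)   # alpha = 0.5: expected to be asymmetric

print(asymmetry(H0), asymmetry(H_half))
```

At α = 0 the off-diagonal entries reduce to the symmetric weights $w_{ij}$, while for α ≠ 0 a factor $e^{\alpha\beta(E_i - E_j)/2}$ survives and breaks the symmetry.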

2.3. Time evolution 

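We discuss linearization of the α-representation of the dynamical equation. Starting from the
nonlinear dynamics

$$\dot\psi_i^{(\alpha)} = \frac{1-\alpha}{2} \bigl(\psi_i^{(\alpha)}\bigr)^{-(1+\alpha)/(1-\alpha)} \sum_j m_{ij} \bigl(\psi_j^{(\alpha)}\bigr)^{2/(1-\alpha)} \tag{13}$$

and considering a small perturbation $\delta\psi^{(\alpha)}$ around $\psi^{(\alpha)}$, we obtain the following linearized
system which describes the time evolution of $\delta\psi^{(\alpha)}$:

$$\delta\dot\psi_i^{(\alpha)} = \bigl(\psi_i^{(\alpha)}\bigr)^{-(1+\alpha)/(1-\alpha)} \sum_j m_{ij} \bigl(\psi_j^{(\alpha)}\bigr)^{(1+\alpha)/(1-\alpha)} \delta\psi_j^{(\alpha)} - \frac{1+\alpha}{2} \bigl(\psi_i^{(\alpha)}\bigr)^{-2/(1-\alpha)} \Bigl[\sum_j m_{ij} \bigl(\psi_j^{(\alpha)}\bigr)^{2/(1-\alpha)}\Bigr] \delta\psi_i^{(\alpha)} + o(\|\delta\psi^{(\alpha)}\|). \tag{14}$$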



In particular, observing that the second term on the right-hand side of (14) vanishes at the
Gibbs-Boltzmann distribution $\bar p$, irrespective of the value of α, the linearization around the
equilibrium point becomes, ignoring higher-order terms,

$$\delta\dot\psi^{(\alpha)} = \bar H^{(-\alpha)}\, \delta\psi^{(\alpha)}. \tag{15}$$

Equation (15) states that the matrix $\bar H^{(-\alpha)}$ governs the local dynamics described in terms
of the α-representation in the vicinity of the equilibrium distribution $\bar p$. It should be noted that
the right-hand side of (14) is in general not a projection of $H^{(-\alpha)}\delta\psi^{(\alpha)}$ onto the manifold of
probability distributions in the α-representation, defined as

$$\Bigl\{\psi^{(\alpha)} : \sum_{i\in S} \Bigl(\frac{1-\alpha}{2}\,\psi_i^{(\alpha)}\Bigr)^{2/(1-\alpha)} = 1 \Bigr\}. \tag{16}$$
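The linearization at equilibrium can be checked by finite differences. The sketch below assumes that the nonlinear dynamics follow from $\dot p = Mp$ and $\psi_i = p_i^{c}/c$ with $c = (1-\alpha)/2$, so that $\dot\psi_i = p_i^{c-1}\dot p_i$. It compares the numerical Jacobian at the Gibbs-Boltzmann point with the matrix whose entries are $m_{ij}(\bar p_j/\bar p_i)^{(1+\alpha)/2}$. The example graph, the value α = 0.3, and the function names are hypothetical.

```python
import math

E = [0.0, 1.0, 0.5]
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # path graph 0 - 1 - 2, unit weights
beta, alpha = 2.0, 0.3
n = len(E)
c = (1 - alpha) / 2

M = [[W[i][j] * math.exp(beta * (E[j] - E[i]) / 2) if i != j else 0.0
      for j in range(n)] for i in range(n)]
for j in range(n):
    M[j][j] = -sum(M[k][j] for k in range(n) if k != j)

Z = sum(math.exp(-beta * e) for e in E)
p_bar = [math.exp(-beta * e) / Z for e in E]
psi_bar = [x ** c / c for x in p_bar]   # alpha-representation of p_bar

def psi_dot(psi):
    """Nonlinear dynamics in the alpha-representation: p_i^(c-1) * (M p)_i."""
    p = [(c * x) ** (1 / c) for x in psi]
    return [p[i] ** (c - 1) * sum(M[i][j] * p[j] for j in range(n))
            for i in range(n)]

def jacobian(psi, eps=1e-6):
    """Central-difference Jacobian of psi_dot at psi."""
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, dn = list(psi), list(psi)
        up[j] += eps
        dn[j] -= eps
        fu, fd = psi_dot(up), psi_dot(dn)
        for i in range(n):
            J[i][j] = (fu[i] - fd[i]) / (2 * eps)
    return J

H_bar = [[M[i][j] * (p_bar[j] / p_bar[i]) ** ((1 + alpha) / 2)
          for j in range(n)] for i in range(n)]
J = jacobian(psi_bar)
error = max(abs(J[i][j] - H_bar[i][j]) for i in range(n) for j in range(n))
print(error)   # small, limited by finite-difference accuracy
```

The perturbations used here are unconstrained, so they leave the normalization manifold; this matches the remark that the linearized right-hand side is not a projection onto that manifold.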



3. Simulated annealing 

3.1. Relaxation in annealing 

With the inverse temperature β fixed, the distribution following the Markov chain relaxes
toward the Gibbs-Boltzmann distribution. The basic idea behind simulated annealing is that by 
gradually reducing the temperature one can arrive at a distribution which concentrates on a set 
of minimum-energy states. Thus, by performing simulations of the Markov chain with a proper 
cooling schedule, one expects to obtain minimum-energy states with probability close to 1. One 
of the basic questions regarding simulated annealing is to determine the cooling schedule which 
guarantees convergence to minimum-energy states. 

We wish to study this problem via the linearized local dynamics in the α-representation (15),
with α = 0. Intuitively, our expectation is that if simulated annealing works well, the distribution
should stay very close to the instantaneous equilibrium distributions as β is changed slowly enough.
If this is the case, then arguments that are based on the local linear approximation around the
equilibrium (15) will be justified. Since the coefficient matrix $\bar H^{(0)}$ is symmetric, all its eigenvalues
are real, so that the local dynamics around equilibrium is a simple linear relaxation toward
the equilibrium, with the negative eigenvalues of $\bar H^{(0)}$ governing the speed of relaxation. In simulated
annealing the instantaneous equilibrium distribution is also slowly drifting as β changes. One
can therefore expect to obtain a minimum-energy distribution only if the drift is slow enough
that the relaxation process manages to catch up with the drift. What is important for
successful convergence of simulated annealing is thus the largest negative eigenvalue of $\bar H^{(0)}$.
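The relaxation at fixed β can be illustrated with a crude numerical integration. This is only a sketch: it uses explicit Euler steps on $\dot p = Mp$ for a hypothetical three-state path graph, with rates assumed to be $m_{ij} = w_{ij}e^{\beta(E_j - E_i)/2}$. The step size and step count are arbitrary choices that happen to be stable for this example.

```python
import math

E = [0.0, 1.0, 0.5]
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # path graph 0 - 1 - 2, unit weights
beta = 2.0
n = len(E)

M = [[W[i][j] * math.exp(beta * (E[j] - E[i]) / 2) if i != j else 0.0
      for j in range(n)] for i in range(n)]
for j in range(n):
    M[j][j] = -sum(M[k][j] for k in range(n) if k != j)

Z = sum(math.exp(-beta * e) for e in E)
p_bar = [math.exp(-beta * e) / Z for e in E]

def relax(p, dt, steps):
    """Explicit Euler integration of dp/dt = M p at fixed beta."""
    for _ in range(steps):
        p = [p[i] + dt * sum(M[i][j] * p[j] for j in range(n))
             for i in range(n)]
    return p

p0 = [1.0 / n] * n                      # start from the uniform distribution
p_T = relax(p0, dt=0.01, steps=5000)    # integrate up to time 50
distance = max(abs(a - b) for a, b in zip(p_T, p_bar))
print(distance)   # close to zero: the chain has relaxed to p_bar
```

The decay rate of `distance` is set by the largest negative eigenvalue, which is why that quantity controls how fast β may be changed during annealing.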

3.2. Bound on largest negative eigenvalue 
We let

$$\mathcal{M} = \bigl(bI + \chi \bar H^{(0)}\bigr)^N, \tag{17}$$

where the factor $\chi = e^{-\beta d/2}/w_{\max}$, with $d = \max_{(ij)\in L} |E_i - E_j|$ and $w_{\max} = \max_{(ij)\in L} w_{ij}$, keeps the diagonal
elements of $\chi\bar H^{(0)}$ from diverging as β gets large, and where

$$b = 1 + \max_i \sum_{k:(ki)\in L} \frac{w_{ki}}{w_{\max}} \tag{18}$$

is chosen so that $bI + \chi\bar H^{(0)}$ becomes a non-negative matrix. Irreducibility of the original Markov
chain guarantees $\mathcal{M}$ to be a (strictly) positive matrix.

The following theorem for positive matrices, due to Hopf [15] in its operator form, is applied
to obtain an upper bound on the largest negative eigenvalue.

Theorem 1 Let $A = (a_{ij})$ be a square matrix that is positive, i.e., $a_{ij} > 0$ holds for all i, j.
Then the maximum eigenvalue $\lambda_0$ of A and any other eigenvalue λ satisfy the inequality

$$|\lambda| \le \frac{\kappa - 1}{\kappa + 1}\,\lambda_0, \tag{19}$$

where

$$\kappa = \max_{i,j,k} \frac{a_{ik}}{a_{jk}}. \tag{20}$$

All positive elements of the matrix $bI + \chi\bar H^{(0)}$ are bounded from below by $\min\{1, w_{\min}\chi\}$,
where $w_{\min} = \min_{(ij)\in L} w_{ij}$, and $w_{\min}\chi$ actually gives the lower bound for not too small values
of β. A lower bound on the minimum element of $\mathcal{M}$ is thus $(w_{\min}\chi)^N$. On the other hand, the matrix
$bI + \chi\bar H^{(0)}$ is bounded from above componentwise by the matrix $(b-1)I + \mathbf{1}\mathbf{1}^T$, where $\mathbf{1}\mathbf{1}^T$ is an all-1
matrix, so that an upper bound on the maximum element of $\mathcal{M}$ is given by $(3N)^N$. An upper
bound on the parameter κ is therefore evaluated as

$$\kappa \le \Bigl(\frac{3N}{w_{\min}\chi}\Bigr)^N. \tag{21}$$

Note that the symmetry of the matrix $bI + \chi\bar H^{(0)}$, and hence of $\mathcal{M}$, makes the argument of bounding
κ straightforward, thereby demonstrating the efficiency of the 0-representation.

Let λ be a negative eigenvalue of $\bar H^{(0)}$. Since we know that $\bar H^{(0)}$ has a zero eigenvalue which
is the largest, applying theorem 1 yields

$$(b + \chi\lambda)^N \le \frac{\kappa - 1}{\kappa + 1}\, b^N, \tag{22}$$

and consequently,

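$$\chi\lambda \le -\frac{2b}{N(\kappa+1)} \le -\frac{2b\,(w_{\min}\chi)^N}{N[(3N)^N + (w_{\min}\chi)^N]} \le -\frac{b\,(w_{\min}\chi)^N}{N(3N)^N}, \tag{23}$$

where we used the inequality $1 - [(\kappa-1)/(\kappa+1)]^{1/N} \ge 2/[N(\kappa+1)]$ for $\kappa, N \ge 1$. To make
clear its dependence on β, we rewrite it as

$$\lambda \le -\delta\, e^{-\beta N(d+d')/2}, \qquad \delta = \frac{b\, w_{\max}}{N(3N)^N}, \tag{24}$$

where we have taken into account possible dependence of $w_{ij}$ on β, by assuming that

$$\frac{w_{\min}}{w_{\max}} \ge e^{-\beta d'/2}, \qquad d' \ge 0 \tag{25}$$

holds.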
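To get a feeling for how loose such a bound can be, the following sketch compares the largest negative eigenvalue of the symmetric α = 0 matrix with a bound of the form $\lambda \le -b\,(w_{\min}\chi)^N / [\chi N (3N)^N]$ on a hypothetical three-state path graph. The form of the bound is an assumption taken from the discussion above. For N = 3 the nonzero eigenvalues are obtained from a quadratic, since one eigenvalue is known to be zero.

```python
import math

E = [0.0, 1.0, 0.5]
edges = {(0, 1): 1.0, (1, 2): 1.0}   # undirected edges with weights w_ij
beta = 2.0
N = len(E)

# Symmetric matrix at alpha = 0: off-diagonal w_ij, diagonal -sum_k w_ki e^{beta (E_i - E_k)/2}.
H = [[0.0] * N for _ in range(N)]
for (i, j), w in edges.items():
    H[i][j] = H[j][i] = w
    H[i][i] -= w * math.exp(beta * (E[i] - E[j]) / 2)
    H[j][j] -= w * math.exp(beta * (E[j] - E[i]) / 2)

# Characteristic polynomial is x (x^2 - tr x + e2) because det H = 0.
tr = sum(H[i][i] for i in range(N))
e2 = sum(H[i][i] * H[j][j] - H[i][j] * H[j][i]
         for i in range(N) for j in range(i + 1, N))
lam = (tr + math.sqrt(tr * tr - 4 * e2)) / 2   # largest negative eigenvalue

w_max, w_min = max(edges.values()), min(edges.values())
d = max(abs(E[i] - E[j]) for (i, j) in edges)
chi = math.exp(-beta * d / 2) / w_max
weight_sums = [sum(w for (i, j), w in edges.items() if k in (i, j))
               for k in range(N)]
b = 1 + max(weight_sums) / w_max
bound = -b * (w_min * chi) ** N / (chi * N * (3 * N) ** N)

print(lam, bound)   # lam lies well below the (negative) bound
```

In this small example the actual eigenvalue is several orders of magnitude further from zero than the bound requires, which is consistent with the bound being a worst-case estimate.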





Figure 1. A "chasing" view of simulated annealing. (Figure labels: "Chaser", "$\ge |\lambda| r$".)



3.3. Simulated annealing as a chase of target 

From now on we assume the inverse temperature β to be a function of time t, and consider the speed
of drift of the instantaneous equilibrium distribution $\bar\psi^{(0)}$. We have

$$\bigl\|\dot{\bar\psi}^{(0)}\bigr\|^2 = \mathrm{Cov}(E)\,\dot\beta^2 \le C^2 e^{-\beta g}\,\dot\beta^2, \tag{26}$$

where g is the energy gap between the lowest and the second lowest energies in $\{E_i;\ i \in S\}$, and
where $C > 0$ is a constant independent of β.

Now the problem is recast into the problem of "chasing" a drifting target (see figure 1), whose
velocity is no more than $Ce^{-\beta g/2}\dot\beta$. The speed of the chaser is no less than $|\lambda| r$, where r is the
"distance" between the chaser and the target, because the speed is determined by gradient
descent on a potential surface induced by $\bar H^{(0)}$. In view of the adiabatic theorem, which lays the
basis of the quantum-mechanics-based analysis of simulated annealing [6], we assume that r is
small throughout the process, so that the local linear approximation of the dynamics is valid.
We wish to obtain a sufficient condition on β, as a function of time t, such that r tends to 0
as $t \to \infty$ and $\beta \to \infty$. With a modest amount of foresight, we assume that r approaches zero
as $r \sim r_0 t^{-\gamma}$ with $0 < \gamma < 1$. Since the speed of the chaser should be larger than that of the
target, as a sufficient condition one has



$$\delta r_0\, e^{-\beta N(d+d')/2}\, t^{-\gamma} \ge C e^{-\beta g/2}\,\dot\beta. \tag{27}$$

Solving it for β, we obtain for large enough t,

$$\beta \le \frac{2(1-\gamma)}{N(d+d') - g}\,\log t + O(1). \tag{28}$$



For consistency, the difference of the speeds of the chaser and the target is equal to $|\dot r|$, which
should scale as $t^{-\gamma-1}$, yielding $\gamma = g/[N(d + d')]$. Collecting these results and ignoring
non-dominant terms, one finally obtains

$$\beta^{-1} = T(t) \ge \frac{N(d+d')}{2\log t} \tag{29}$$

as a sufficient condition for simulated annealing to converge to a minimum-energy distribution.
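The arithmetic behind the final schedule can be traced with a few lines of code. The sketch below assumes the relations $\gamma = g/[N(d+d')]$, $\beta \le \frac{2(1-\gamma)}{N(d+d')-g}\log t + O(1)$, and $T(t) \ge N(d+d')/(2\log t)$ as stated in the derivation. The function names and the numerical values of N, d, d' and g are hypothetical.

```python
import math

def schedule_constants(N, d, d_prime, g):
    """Return (gamma, slope), where beta(t) <= slope * log t + O(1)."""
    gamma = g / (N * (d + d_prime))
    slope = 2 * (1 - gamma) / (N * (d + d_prime) - g)
    return gamma, slope

def temperature_lower_bound(t, N, d, d_prime):
    """Sufficient cooling schedule T(t) >= N (d + d') / (2 log t); requires t > 1."""
    return N * (d + d_prime) / (2 * math.log(t))

# Hypothetical values: three states, d = 1, d' = 0, energy gap g = 0.5.
gamma, slope = schedule_constants(N=3, d=1.0, d_prime=0.0, g=0.5)
T_values = [temperature_lower_bound(t, 3, 1.0, 0.0) for t in (10, 100, 1000)]

# Substituting the consistent gamma makes the slope collapse to 2 / [N (d + d')].
print(gamma, slope, 2 / (3 * 1.0))
print(T_values)
```

Note that the energy gap g drops out of the final slope once the self-consistent value of γ is inserted, which is why g does not appear in the resulting temperature schedule.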



4. Stochastic matrix form decomposition 

The stochastic matrix form (SMF) decomposition, defined in [9], is a key to establishing
the classical-to-quantum mapping. In this section, we briefly discuss the relation between our
formulation and the SMF decomposition.

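The SMF decomposition of $H^{(\alpha)}$ is given by

$$H^{(\alpha)} = \sum_{(ij)\in L} w_{ij}\, H_{ij}^{(\alpha)}, \tag{30}$$

with

$$H_{ij}^{(\alpha)} = e^{\beta(E_j - E_i)/2}\Bigl[\frac{\psi_j^{(\alpha)}}{\psi_i^{(\alpha)}}\,\mathcal{E}_{ij} - \mathcal{E}_{jj}\Bigr] + e^{\beta(E_i - E_j)/2}\Bigl[\frac{\psi_i^{(\alpha)}}{\psi_j^{(\alpha)}}\,\mathcal{E}_{ji} - \mathcal{E}_{ii}\Bigr], \tag{31}$$

where $\mathcal{E}_{ij}$ is a matrix with the (i, j) element being 1 and all others being 0. Let $\bar H_{ij}^{(\alpha)}$ denote the matrix
$H_{ij}^{(\alpha)}$ evaluated at the equilibrium distribution of the Markov chain, that is, $\bar H_{ij}^{(\alpha)} = H_{ij}^{(\alpha)}|_{p=\bar p}$. When
α = 0, it becomes

$$\bar H_{ij}^{(0)} = \mathcal{E}_{ij} + \mathcal{E}_{ji} - e^{\beta(E_i - E_j)/2}\,\mathcal{E}_{ii} - e^{\beta(E_j - E_i)/2}\,\mathcal{E}_{jj}. \tag{32}$$

The matrix $\bar H_{ij}^{(0)}$ is symmetric.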

The α-representation of the Gibbs-Boltzmann distribution, $\bar\psi^{(\alpha)}$, is an eigenvector of the
matrix $\bar H_{ij}^{(-\alpha)}$ with eigenvalue 0, that is,

$$\bar H_{ij}^{(-\alpha)}\,\bar\psi^{(\alpha)} = 0 \tag{33}$$

holds. This condition corresponds to the detailed-balance condition of the original formulation
of the Markov chain. Note that it is consistent with the fact that $\bar\psi^{(\alpha)}$ is an eigenvector of the
matrix $\bar H^{(-\alpha)}$ with eigenvalue 0.
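The edge-wise zero-eigenvector property can be verified for α = 0 on a small example. The sketch below assumes that the α = 0 component for an edge (i, j) has entries 1 at positions (i, j) and (j, i), $-e^{\beta(E_i - E_j)/2}$ at (i, i), and $-e^{\beta(E_j - E_i)/2}$ at (j, j). The graph, energies, and the helper name `edge_matrix` are hypothetical.

```python
import math

E = [0.0, 1.0, 0.5]
edges = {(0, 1): 1.0, (1, 2): 1.0}   # undirected edges with weights w_ij
beta = 2.0
n = len(E)

def edge_matrix(i, j):
    """alpha = 0 SMF component for edge (i, j), evaluated at equilibrium."""
    H = [[0.0] * n for _ in range(n)]
    H[i][j] = H[j][i] = 1.0
    H[i][i] = -math.exp(beta * (E[i] - E[j]) / 2)
    H[j][j] = -math.exp(beta * (E[j] - E[i]) / 2)
    return H

# Proportional to the 0-representation of the Gibbs-Boltzmann distribution.
sqrt_p = [math.exp(-beta * e / 2) for e in E]

residuals = []
total = [[0.0] * n for _ in range(n)]
for (i, j), w in edges.items():
    H = edge_matrix(i, j)
    # Each edge component should annihilate sqrt_p on its own (detailed balance).
    residuals.extend(abs(sum(H[r][c] * sqrt_p[c] for c in range(n)))
                     for r in range(n))
    for r in range(n):
        for c in range(n):
            total[r][c] += w * H[r][c]

print(max(residuals))   # numerically zero
print(total)            # weighted sum of the components: the symmetric alpha = 0 matrix
```

That every edge component annihilates the equilibrium vector separately is a stronger statement than the sum doing so, and it is this edge-by-edge cancellation that mirrors detailed balance.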



5. Conclusion 

In this paper, we have discussed a classical reinterpretation of the quantum-mechanics-based
analysis of classical simulated annealing [6], which rests on the quantum-classical
correspondence [8, 9, 10]. We have provided a reformulation of a Markov chain with detailed
balance via the α-representation, as well as a local linear approximation of its time evolution. It has
been shown that the local linear approximation preserving the eigenvalues of the original Markov
chain (equation (15)) is valid only in the vicinity of the equilibrium distribution. On the basis
of the 0-representation-based reformulation, we have shown that the inverse-logarithm scaling
of temperature in simulated annealing that guarantees optimality is successfully reproduced,
without recourse to the quantum adiabatic theorem.

We believe that the usefulness of the α-representation of Markov chains with detailed balance
goes well beyond just deriving the inverse-logarithm scaling, and hope that our reformulation
helps shed light on the usefulness of the α-representation in more general contexts.